feat(copilot): MCTS dynamic strategy engine with reward model and rollout simulation #13

Closed
ldemon2333 wants to merge 1 commit into AnnaSuSu:main from ldemon2333:main

Conversation

@ldemon2333 ldemon2333 commented Apr 7, 2026

Summary

Add an MCTS (Monte Carlo Tree Search) dynamic strategy engine to the interview copilot, enabling real-time strategy optimization during mock interviews.

Changes

New Modules (backend/copilot/)

  • mcts_config.py — MCTSConfig, MCTSNode, StrategyRecommendation data structures
  • reward_model.py — RewardModel with cosine similarity scoring: R(S) = W1·Match_JD + W2·Safe - W3·Risk
  • simulation_engine.py — 3-level degradation rollout simulator (LLM → lightweight LLM → pure reward)
  • mcts_engine.py — Full MCTS 4-step engine (Select/Expand/Simulate/Backprop) with PUCT selection
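
A minimal sketch of the reward formula R(S) = W1·Match_JD + W2·Safe - W3·Risk in pure numpy. The PR description does not say how Safe and Risk are computed, so this sketch assumes all three terms are cosine similarities against reference embeddings; the function names, the `safe_vec` / `risk_vec` arguments, and the default weights are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


def reward(strategy_vec, jd_vec, safe_vec, risk_vec, w1=0.5, w2=0.3, w3=0.2):
    """R(S) = W1*Match_JD + W2*Safe - W3*Risk.

    Assumption: each term is the cosine similarity between the strategy
    embedding and a reference embedding (job description, "safe" topics,
    "risky" topics). The weights are illustrative, not the PR's values.
    """
    match_jd = cosine_similarity(strategy_vec, jd_vec)
    safe = cosine_similarity(strategy_vec, safe_vec)
    risk = cosine_similarity(strategy_vec, risk_vec)
    return w1 * match_jd + w2 * safe - w3 * risk
```

A strategy that aligns with the job description and the safe reference, and is orthogonal to the risk reference, scores w1 + w2 under these assumptions.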

Modified Files

  • config.py — 11 new mcts_* settings (feature-flagged, disabled by default)
  • llm_provider.py — get_mcts_rollout_llm() for simulation
  • main.py — Integration into copilot WebSocket session as async background task

Frontend

  • frontend/src/hooks/useCopilotStream.js — Add strategy_recommendation case to WebSocket message switch, ensuring MCTS search results are forwarded to the UI via onUpdate callback (without this the backend pushes the message but the frontend silently drops it)

Bug Fixes

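  • ASR startup fix: the NLS SDK start() returns None rather than a truthy value, so the check now uses try/except
  • MCTS cleanup on WebSocket disconnect: the finally block now calls mcts.stop(), preventing the search Task from writing to a closed WS
  • No redundant MCTS search after a candidate answer: search is triggered only when HR speaks
  • _try_merge_static variable naming: renamed to _matched_node_id (the value is actually used, so it should not be a throwaway name)
  • Expansion depth uses a config value: new max_expansion_depth replaces the hard-coded 3
  • Embedding calls no longer block the event loop: the synchronous API is wrapped in asyncio.to_thread()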

Docs and Tests

  • docs/mcts-strategy.md — User-facing feature documentation
  • docs/SUMMARY.md — Updated index
  • tests/test_mcts_engine.py — 39 unit tests covering all modules

Key Design Decisions

  • Feature-flagged: MCTS_ENABLED=false by default, zero impact when disabled
  • PUCT variant: AlphaGo-style selection with LLM confidence as prior, c_puct=1.4
  • Pure numpy: No heavy ML dependencies, less than 10ms per reward evaluation
  • Graceful degradation: Falls back to reward-only evaluation if LLM rollout fails
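
The PUCT selection mentioned above can be sketched as follows, using the commonly cited AlphaGo-style form Q + c_puct · P · sqrt(N_parent) / (1 + N_child). This is an illustration under that assumption, not the code in mcts_engine.py; `SketchNode` and its fields are hypothetical names, and only c_puct=1.4 comes from the description.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical tree node; `prior` stands in for the LLM confidence."""
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)

    @property
    def q(self):
        # Mean backed-up value; 0.0 for unvisited nodes.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent, child, c_puct=1.4):
    """Exploitation term Q plus a prior-weighted exploration bonus."""
    bonus = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q + bonus


def select_child(parent, c_puct=1.4):
    """Pick the child with the highest PUCT score."""
    return max(parent.children, key=lambda ch: puct_score(parent, ch, c_puct))
```

With this form, an unvisited child with a high prior can outrank a well-visited child with a good mean value, which is what lets the LLM confidence steer early exploration.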

Owner

@AnnaSuSu AnnaSuSu left a comment

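The overall design is good: the game-theoretic modeling is clear, the module split is clean, the feature flag keeps it non-invasive, and degradation has been thought through. The following issues need to be fixed first:


Bugs (must fix)

1. _get_weak_points always returns an empty list

mcts_engine.py:506 reads weak_points from prep_state.get("profile", {}), but prep_result has no "profile" key. The candidate profile is not in the prep state. Either read from fit_report.get("gaps", []) instead, or pass the profile into prep_state in _init_mcts_engine.

2. The MCTS engine is not cleaned up when the WebSocket disconnects

The finally block in main.py only cleans up ASR and does not call mcts.stop(). After a disconnect, the search Task keeps running and then tries to ws.send_json() on a closed WebSocket. Add: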

finally:
    if session and session.get("asr"):
        session["asr"].shutdown()
    if session and session.get("mcts_engine"):
        await session["mcts_engine"].stop()
    _copilot_sessions.pop(session_id, None)
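
For illustration, one way a stop() like the one awaited above could behave, assuming the engine keeps its in-flight search in an asyncio.Task. `SearchRunner`, `start`, and `_task` are hypothetical names and are not taken from the PR.

```python
import asyncio
import contextlib


class SearchRunner:
    """Hypothetical stand-in for the engine's background-search handling."""

    def __init__(self):
        self._task = None

    def start(self, coro):
        # Schedule the search on the running event loop.
        self._task = asyncio.create_task(coro)

    async def stop(self):
        # Detach the task first so a second stop() call is a no-op.
        task, self._task = self._task, None
        if task is None or task.done():
            return
        task.cancel()
        # Wait for the cancellation to take effect before returning,
        # so nothing can write to the socket after cleanup.
        with contextlib.suppress(asyncio.CancelledError):
            await task
```

Awaiting the cancelled task, rather than only calling cancel(), is what ensures the search has really ended by the time the session is removed.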

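3. Triggering an MCTS search after the candidate answers is questionable

In main.py, create_task(_run_mcts_and_push) is called again after on_candidate_response, but at that point the root node is still the previous HR question. The candidate has already answered, so searching candidate strategies on the old root is pointless. Suggestions:

  • Remove the MCTS trigger after the candidate answers
  • Or use the candidate's answer as the new root and search to predict HR's next follow-up

Please confirm the design intent here.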


Suggested improvements

4. _try_merge_static uses _ as a variable name but actually reads it

_, static_intent, score = self.navigator.match_utterance(...)
static_node = self.navigator.get_node(_)

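By convention _ is a throwaway name, but here it is used as the node_id. Suggest renaming it.

5. Hard-coded expansion depth

In _run_iteration, leaf.depth < 3 is hard-coded; the config has rollout_depth but it is not used. Suggest using the config value or adding a separate max_expansion_depth.

6. Synchronous get_text_embedding() call blocks the event loop

_expand and on_hr_utterance call embed.get_text_embedding() directly, which blocks if an API-backed embedding is used. Suggest wrapping it in asyncio.to_thread().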
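
A minimal sketch of that wrapping, assuming a blocking embedding function. The `get_text_embedding` below is a hypothetical stand-in that sleeps to simulate a network call, and `embed_async` is an illustrative helper name; asyncio.to_thread() requires Python 3.9 or later.

```python
import asyncio
import time


def get_text_embedding(text):
    """Hypothetical blocking embedding call (stands in for an API request)."""
    time.sleep(0.05)
    return [float(len(text))]


async def embed_async(text):
    # Run the blocking call in a worker thread so the event loop stays
    # free to service WebSocket traffic while the embedding is computed.
    return await asyncio.to_thread(get_text_embedding, text)
```

Inside a coroutine the call site then becomes `vec = await embed_async(utterance)` in place of the direct synchronous call.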


I'll take another look once 1-3 are fixed; the others do not block merging.

Author

ldemon2333 commented Apr 8, 2026 via email

Owner

AnnaSuSu commented Apr 8, 2026

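One addition: I retract item 1. My review was wrong.

I just re-traced how prep_result is built. The dict returned at copilot_prep.py:207 does contain "profile" (line 211), which comes from memory.get_profile(user_id) and does include weak_points (memory.py has been maintaining that field all along). You were right that your tests did not reproduce it: _get_weak_points gets its data correctly, so nothing needs to change here.

I said "prep_result has no profile key" from memory without tracing it to the source. Sorry about that.

One more reminder: your PR adds the strategy_recommendation message type on the backend, but the switch in frontend/src/hooks/useCopilotStream.js has no matching case, so the frontend silently drops the message and the panel never receives the MCTS search results. Remember to include the frontend change before merging, otherwise the end-to-end path does not work.

Push the remaining fixes together and I'll go through it again.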

…lout simulation

- Add MCTSConfig, MCTSNode, StrategyRecommendation data structures
- Add RewardModel with cosine similarity scoring (R = W1·Match + W2·Safe - W3·Risk)
- Add SimulationEngine with 3-level degradation rollout
- Add MCTSEngine with PUCT selection, LLM expansion, backpropagation
- Integrate MCTS into copilot WebSocket session (feature-flagged, off by default)
- Add 11 mcts_* settings to config and rollout LLM provider
- Add user-facing docs and 39 unit tests
@AnnaSuSu
Owner

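@ldemon2333 First, thank you for the work you put into this PR. The module split (mcts_config / mcts_engine / reward_model / simulation_engine) is clean, the feature flag is off by default and non-invasive, the 3-level degradation strategy is well designed, and the 39 unit tests give solid coverage. You followed up on every issue from the first review round, and the code quality itself fully meets the bar for merging.

After some thought, though, I have decided not to merge this PR for now. The reason is not code quality but the project's current stage:

The Copilot real-time pipeline is still in a stabilization phase. Over the past few weeks the main pipeline itself has kept converging: the ASR startup logic, the WebSocket lifecycle, the trigger semantics for candidate/HR utterances, and the interfaces between the layers of the Intent Classifier → Answer Coach → Interview Monitor → HR Profiler pipeline have not fully settled. Stacking a game-theoretic simulation module on top at this point means every change to the main pipeline must also account for the coupling points with MCTS (the prep_state structure, the StrategyTreeNavigator interface, the WebSocket message types, the initialization order in _init_copilot_session, and so on). At this stage the maintenance cost would exceed the marginal benefit of the feature.

This is a scope and timing decision, not a rejection of your work. Once the Copilot core pipeline has stabilized and we have confirmed "dynamic strategy" as a direction on the roadmap, the ideas in your branch (especially the embedding scoring in reward_model and the degradation strategy in the simulation engine) are a good starting point. I suggest keeping the branch in your own fork so it is easy to pick up again later.

Thanks again for your contribution, and apologies for putting you through a full review cycle.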

@AnnaSuSu AnnaSuSu closed this Apr 14, 2026
Author

ldemon2333 commented Apr 14, 2026 via email
